An Improved Model-based Speake

Authors

  • Peng Yu
  • Frank Seide
  • Eric Chang
Abstract

In this paper, we report our recent work on speaker segmentation. Without a priori information about the number of speakers or their identities, the audio stream is segmented, and segments from the same speaker are grouped together. Speakers are represented by Gaussian Mixture Models (GMMs), and an HMM network is then used for segmentation. Unlike other model-based segmentation methods, however, the speaker GMMs are initialized using a simpler distance-based segmentation algorithm. To group segments from the same speaker, a two-level clustering mechanism is introduced, which we found to achieve higher accuracy than direct distance-based clustering methods. Our method significantly outperforms the best result reported at the 2002 Speaker Recognition Workshop. When tested on a set of professionally produced TV programs, our system yields a frame error rate of only 3.5%.
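
The abstract says the speaker GMMs are initialized from a simpler distance-based segmentation, but it does not name the distance measure or the window settings. The following is only a minimal sketch of that general idea, under stated assumptions: two adjacent sliding windows are each modeled by a single diagonal Gaussian, the distance is the symmetric Kullback-Leibler divergence, and a change point is declared at local peaks above a threshold. The function names, the window sizes, and the mean-plus-one-standard-deviation threshold are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np

def symmetric_kl_diag(x, y, eps=1e-6):
    """Symmetric KL divergence between diagonal Gaussians fitted to x and y.

    x, y: arrays of shape (n_frames, n_features). Assumed stand-in distance;
    the abstract does not specify which measure the authors used.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    vx, vy = x.var(axis=0) + eps, y.var(axis=0) + eps  # eps avoids division by zero
    d2 = (mx - my) ** 2
    # KL(p||q) + KL(q||p); the log-variance terms cancel in the sum.
    return 0.5 * np.sum(vx / vy + vy / vx - 2.0 + d2 * (1.0 / vx + 1.0 / vy))

def detect_changes(frames, win=100, step=10, threshold=None):
    """Return frame indices of candidate speaker changes.

    Slides two adjacent windows of `win` frames over `frames` and keeps the
    local peaks of the distance curve that exceed `threshold`.
    """
    positions = list(range(win, len(frames) - win + 1, step))
    dists = np.array([
        symmetric_kl_diag(frames[p - win:p], frames[p:p + win])
        for p in positions
    ])
    if threshold is None:
        # Hypothetical heuristic: mean plus one standard deviation of the curve.
        threshold = dists.mean() + dists.std()
    changes = []
    for i in range(1, len(dists) - 1):
        is_peak = dists[i] >= dists[i - 1] and dists[i] > dists[i + 1]
        if is_peak and dists[i] > threshold:
            changes.append(positions[i])
    return changes
```

In a pipeline like the one the abstract describes, the segments between detected change points would then serve as initial training data for per-speaker GMMs. That step, the HMM network, and the two-level clustering are not shown here.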

Similar resources

An Improved UTD Based Model For The Multiple Building Diffraction Of Plane Waves In Urban Environments By Using Higher Order Diffraction Coefficients

This paper describes an improved model for multiple building diffraction modeling based on the uniform theory of diffraction (UTD). A well-known problem in conventional uniform theory of diffraction (CUTD) is multiple-edge transition zone diffraction. Here, higher order diffracted fields are used in order to improve the result; hence, we use higher order diffraction coefficients to improve a hy...

A Distance measure Between GMMs Based o its Application to Speake

This paper proposes a dissimilarity measure between two Gaussian mixture models (GMM). Computing a distance measure between two GMMs that were learned from speech segments is a key element in speaker verification, speaker segmentation and many other related applications. A natural measure between two distributions is the Kullback-Leibler divergence. However, it cannot be analytically computed i...

Designing a Speaker-discrim Filter Bank for Speake

A new filter bank approach for the speaker recognition front-end is proposed. The conventional mel-scaled filter bank is replaced with a speaker-discriminative filter bank. The filter bank is selected adaptively from a library, based on the broad phoneme class of the input frame. Each phoneme class is associated with its own filter bank. Each filter bank is designed in a way that emphasizes disc...

An improved opposition-based Crow Search Algorithm for Data Clustering

Data clustering is an ideal way of working with a huge amount of data and looking for structure in the dataset. In other words, clustering groups similar data together: the similarity among the data within a cluster is maximal, and the similarity among the data in different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...

An Improved COCOMO based Model to Estimate the Effort of Software Projects

One of the important aspects of software projects is estimating the cost and time required to develop them. Nowadays, this issue has become one of the key concerns of project managers. Accurate estimation of the effort needed to produce and develop software strongly affects the success or failure of software projects and is regarded as a vital factor. Failure to achieve convincing a...

Publication date: 2003